Stage 1: Data Ingestion¶
- Define Configuration for Interacting with the Kaggle Public API
- Download the Kaggle Dataset using Kaggle Credentials and Save the Data to the `data` Directory
- Create a Pipeline that Automates Data Ingestion for any Publicly Available Kaggle Dataset
Configuration¶
In [ ]:
import os
os.chdir('../')
print(f'Current Working Directory: {os.getcwd()}')
Current Working Directory: /mnt/e/Research/dvc/deepgloberoadextraction
In [ ]:
from dataclasses import dataclass
from pathlib import Path
@dataclass(frozen=True)
class DataIngestionConfig:
# Kaggle Credentials from secrets.yaml
username: str
token: str
# config.yaml
download_dir: Path
dataset_id: str
from DeepGlobeRoadExtraction import CONFIG_FILE_PATH, SECRETS_FILE_PATH
from DeepGlobeRoadExtraction.utils.common import read_yaml, create_directories, show_config
class ConfigurationManager:
def __init__(self, config_filepath = CONFIG_FILE_PATH, secrets_filepath = SECRETS_FILE_PATH) -> None:
self.config = read_yaml(config_filepath)
self.secrets = read_yaml(secrets_filepath)
create_directories([self.config.data_ingestion.download_dir])
def get_data_ingestion_config(self) -> DataIngestionConfig:
config = self.config.data_ingestion
secrets = self.secrets.kaggle
cfg = DataIngestionConfig(
download_dir=Path(config.download_dir),
dataset_id=config.dataset_id,
username=secrets.username,
token=secrets.token
)
return cfg
cfg = ConfigurationManager().get_data_ingestion_config()
show_config(cfg)
[2024-06-17 12:50:02,111: INFO: common: yaml file: config.yaml loaded successfully]
[2024-06-17 12:50:02,119: INFO: common: yaml file: secrets.yaml loaded successfully]
[2024-06-17 12:50:02,121: INFO: common: created directory at: data]
Configuration:
username: adityasharma47
token: dcc850f67e063a3c25586bc8e81edefe
download_dir: data
dataset_id: balraj98/deepglobe-road-extraction-dataset
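For reference, the two YAML files presumably look like the sketch below, inferred from the keys that ConfigurationManager accesses (`config.data_ingestion` and `secrets.kaggle`); the credential values are placeholders.
In [ ]:
# Hypothetical config.yaml / secrets.yaml layout, inferred from the attribute
# access in ConfigurationManager above; values are illustrative placeholders
import yaml
example_config = yaml.safe_load("""
data_ingestion:
  download_dir: data
  dataset_id: balraj98/deepglobe-road-extraction-dataset
""")
example_secrets = yaml.safe_load("""
kaggle:
  username: <your-kaggle-username>
  token: <your-kaggle-api-token>
""")
print(example_config)
print(example_secrets)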
Components¶
In [ ]:
import os
import json
from DeepGlobeRoadExtraction import logger
class DataIngestionComponents:
def __init__(self, config: DataIngestionConfig) -> None:
self.config = config
def initialise_kaggle(self):
logger.info(f'---------- Initialising Kaggle Account ----------')
        # Set Path for Kaggle Configuration File
KAGGLE_CONFIG_DIR = os.path.join(os.path.expandvars('$HOME'), '.kaggle')
KAGGLE_CONFIG_FILE = os.path.join(KAGGLE_CONFIG_DIR, 'kaggle.json')
# Check if kaggle.json already exists and is not empty
if os.path.exists(KAGGLE_CONFIG_FILE) and os.path.getsize(KAGGLE_CONFIG_FILE) > 0:
            logger.warning(f'---> Kaggle Account Credentials Found ==> {KAGGLE_CONFIG_FILE}. Remove this file and re-initialise if the API token is invalid or has expired.')
return
# Otherwise create .kaggle directory
os.makedirs(KAGGLE_CONFIG_DIR, exist_ok=True)
try:
username = self.config.username
token = self.config.token
api_dict = {'username': username, 'key': token}
# Create a kaggle.json file inside .kaggle folder and add your credentials
with open(KAGGLE_CONFIG_FILE, "w", encoding="utf-8") as f:
json.dump(api_dict, f)
            # Restrict file permissions to owner read/write (equivalent to chmod 600)
            os.chmod(KAGGLE_CONFIG_FILE, 0o600)
except Exception as e:
logger.error('Failed to Initialise Kaggle Account!')
raise e
# Download Kaggle Dataset
def download_dataset(self):
from kaggle.api.kaggle_api_extended import KaggleApi
logger.info(f'---------- Downloading Kaggle Dataset: {self.config.dataset_id} ----------')
try:
api = KaggleApi()
api.authenticate()
api.dataset_download_files(
dataset=self.config.dataset_id,
path=self.config.download_dir,
unzip=True,
force=False,
quiet=True
)
logger.info('---> Download Complete!')
except Exception as e:
logger.error('Kaggle dataset download failed!')
raise e
Pipeline¶
In [ ]:
class DataIngestionPipeline:
def __init__(self) -> None:
pass
def main(self):
config = ConfigurationManager().get_data_ingestion_config()
pipeline = DataIngestionComponents(config=config)
pipeline.initialise_kaggle()
pipeline.download_dataset()
DataIngestionPipeline().main()
[2024-06-17 12:38:43,329: INFO: common: yaml file: config.yaml loaded successfully]
[2024-06-17 12:38:43,332: INFO: common: yaml file: secrets.yaml loaded successfully]
[2024-06-17 12:38:43,333: INFO: common: created directory at: data]
[2024-06-17 12:38:43,333: INFO: 4097356871: ---------- Initialising Kaggle Account ----------]
[2024-06-17 12:38:43,338: WARNING: 4097356871: ---> Kaggle Account Credentials Found ==> /home/asharma271/.kaggle/kaggle.json. Remove this file and re-initialise if the API token is invalid or has expired.]
[2024-06-17 12:38:44,703: INFO: 4097356871: ---------- Downloading Kaggle Dataset: balraj98/deepglobe-road-extraction-dataset ----------]
Dataset URL: https://www.kaggle.com/datasets/balraj98/deepglobe-road-extraction-dataset
[2024-06-17 12:46:38,191: INFO: 4097356871: ---> Download Complete!]
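A quick post-download sanity check (a sketch; it assumes the archive unzipped directly into the `data` directory, which the `data/metadata.csv` path used in Stage 2 suggests):
In [ ]:
from pathlib import Path
data_dir = Path('data')
print('metadata.csv present:', (data_dir / 'metadata.csv').exists())
print('top-level entries:', len(list(data_dir.iterdir())))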
Stage 2: Data Preparation¶
- Read `metadata.csv`
- Split the training images into train, validation, and test groups using the random state from `params.yaml`
- Export the updated metadata, with a split `group` column, as a new file called `metadataV2.csv` and save it to the data directory
In [ ]:
import os
os.chdir('../')
print(f'Current Working Directory: {os.getcwd()}')
Current Working Directory: /Users/geovicco/Coding/Projects/deepgloberoadextraction
Configuration¶
In [ ]:
from dataclasses import dataclass
from pathlib import Path
from typing import List
@dataclass(frozen=True)
class DataPreparationConfig:
# config.yaml
data_directory: Path
metadata_csv: Path
out_metadata_csv: Path
# params.yaml
random_state: int
train_val_split: List[float]
from DeepGlobeRoadExtraction import CONFIG_FILE_PATH, PARAMS_FILE_PATH
from DeepGlobeRoadExtraction.utils.common import read_yaml, show_config
class ConfigurationManager:
def __init__(self, config_filepath=CONFIG_FILE_PATH, params_filepath=PARAMS_FILE_PATH) -> None:
self.config = read_yaml(config_filepath)
self.params = read_yaml(params_filepath)
def get_data_preparation_config(self) -> DataPreparationConfig:
config = self.config.data_preparation
params = self.params
cfg = DataPreparationConfig(
data_directory=Path(config.data_dir),
metadata_csv=Path(config.metadata_csv),
random_state=params.random_state,
train_val_split=params.train_val_split,
out_metadata_csv=Path(config.out_metadata_csv)
)
return cfg
cfg = ConfigurationManager().get_data_preparation_config()
show_config(cfg)
[2024-06-17 12:50:26,904: INFO: common: yaml file: config.yaml loaded successfully]
[2024-06-17 12:50:26,907: INFO: common: yaml file: params.yaml loaded successfully]
Configuration:
data_directory: data
metadata_csv: data/metadata.csv
out_metadata_csv: data/metadataV2.csv
random_state: 26
train_val_split: [0.7, 0.2, 0.1]
Components¶
In [ ]:
from DeepGlobeRoadExtraction import logger
import pandas as pd
from sklearn.model_selection import train_test_split
class DataPreparationComponents:
def __init__(self, config: DataPreparationConfig) -> None:
self.config = config
def load_metadata(self):
logger.info(f'------------- Loading Metadata -------------')
metadata_df = pd.read_csv(self.config.metadata_csv) # Read Metadata
metadata_df = metadata_df[metadata_df['split']=='train'] # Filter all rows that have 'train' in the 'split' column
metadata_df = metadata_df[['image_id', 'sat_image_path', 'mask_path']] # Keep only 'image_id', 'sat_image_path' and 'mask_path' columns
metadata_df['sat_image_path'] = metadata_df['sat_image_path'].apply(lambda img_pth: os.path.join(self.config.data_directory, img_pth)) # Add data_directory to sat_image_path
metadata_df['mask_path'] = metadata_df['mask_path'].apply(lambda img_pth: os.path.join(self.config.data_directory, img_pth)) # Add data_directory to mask_path
self.metadata = metadata_df
def split_dataset(self):
logger.info(f'------------- Splitting Training Dataset into Train and Validation -------------')
metadata_df = self.metadata
        # Shuffle DataFrame (seeded so the split is reproducible across pipeline runs)
        metadata_df = metadata_df.sample(frac=1, random_state=self.config.random_state).reset_index(drop=True)
        # First split: carve out the training fraction
        train_df, valid_df = train_test_split(metadata_df, train_size=self.config.train_val_split[0], random_state=self.config.random_state)
        # Second split: renormalise the validation fraction over the remainder,
        # e.g. [0.7, 0.2, 0.1] -> validation gets 0.2 / (0.2 + 0.1) of the leftover 30%
        valid_df, test_df = train_test_split(valid_df, train_size=self.config.train_val_split[1]/(self.config.train_val_split[1]+self.config.train_val_split[2]), random_state=self.config.random_state)
        # Copy the slices before assigning to avoid pandas SettingWithCopyWarning
        train_df, valid_df, test_df = train_df.copy(), valid_df.copy(), test_df.copy()
        train_df['group'] = 'train'
        valid_df['group'] = 'val'
        test_df['group'] = 'test'
# Concatenate DataFrames
self.metadata = pd.concat([train_df, valid_df, test_df])
# Export Metadata
self.metadata.to_csv(self.config.out_metadata_csv, index=False)
del train_df, valid_df, test_df, metadata_df
Pipeline¶
In [ ]:
class DataPreparationPipeline:
def __init__(self) -> None:
pass
def main(self):
config = ConfigurationManager().get_data_preparation_config()
pipeline = DataPreparationComponents(config=config)
pipeline.load_metadata()
pipeline.split_dataset()
DataPreparationPipeline().main()
[2024-06-17 12:51:00,006: INFO: common: yaml file: config.yaml loaded successfully]
[2024-06-17 12:51:00,010: INFO: common: yaml file: params.yaml loaded successfully]
[2024-06-17 12:51:00,011: INFO: 3406445105: ------------- Loading Metadata -------------]
[2024-06-17 12:51:00,069: INFO: 3406445105: ------------- Splitting Training Dataset into Train and Validation -------------]
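A quick check that the exported groups roughly match `train_val_split` (a small sketch, run after the pipeline above):
In [ ]:
import pandas as pd
df = pd.read_csv('data/metadataV2.csv')
# Expect roughly 0.70 / 0.20 / 0.10 for train / val / test
print(df['group'].value_counts(normalize=True).round(2))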
Stage 3: Model Training¶
- Read `data/metadataV2.csv`
- Create PyTorch Lightning Data Module
  - Input Arguments: `metadataV2.csv`, `batch_size`, `augmentations`, `prefetch_factor`, `num_workers`, and `resize_dimensions`
- Initialise Model
- Load Weights [Optional]
- Define Callbacks
- Train Model
- Evaluate Model
- Save Model as ONNX, .pth, and State Dict
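For reference, a hypothetical `params.yaml` consistent with the `TrainingConfig` dataclass defined below and the values printed by `show_config`; the literal string `None` for `checkpoint_path` is deliberate, since `ConfigurationManager` compares against `'None'`.
In [ ]:
# Hypothetical params.yaml for the training stage; field names mirror the
# TrainingConfig dataclass, values mirror the show_config output below
import yaml
example_params = yaml.safe_load("""
architecture: DeepLabV3Plus
encoder: resnet50
encoder_weights: imagenet
n_classes: 1
n_channels: 3
epochs: 25
lr: 0.004
batch_size: 16
device: auto
num_workers: 32
prefetch_factor: 16
resize_dimension: 512
checkpoint_path: None
optimizer: adamax
loss: JaccardLoss
apply_preprocessing: False
tune_lr: True
dev_run: False
""")
print(example_params)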
In [ ]:
import os
os.chdir('../')
print(f'Current Working Directory: {os.getcwd()}')
Current Working Directory: /mnt/e/Research/dvc/deepgloberoadextraction
In [ ]:
from dataclasses import dataclass
from pathlib import Path
@dataclass(frozen=True)
class TrainingConfig:
# config.yaml
models_dir: Path
metadata_csv: Path
logs_dir: Path
# params.yaml
architecture: str
encoder: str
encoder_weights: str
n_classes: int
n_channels: int
epochs: int
lr: float
batch_size: int
device: str
num_workers: int
prefetch_factor: int
resize_dimension: int
checkpoint_path: Path
optimizer: str
loss: str
apply_preprocessing: bool
tune_lr: bool
dev_run: bool
from DeepGlobeRoadExtraction import CONFIG_FILE_PATH, PARAMS_FILE_PATH
from DeepGlobeRoadExtraction.utils.common import read_yaml, show_config
class ConfigurationManager:
def __init__(self, config_filepath=CONFIG_FILE_PATH, params_filepath=PARAMS_FILE_PATH) -> None:
self.config = read_yaml(config_filepath)
self.params = read_yaml(params_filepath)
def get_training_config(self) -> TrainingConfig:
config = self.config.training
params = self.params
cfg = TrainingConfig(
models_dir=Path(config.models_dir),
metadata_csv=Path(config.metadata_csv),
logs_dir=Path(config.logs_dir),
architecture=params.architecture,
encoder=params.encoder,
encoder_weights=params.encoder_weights,
n_classes=params.n_classes,
n_channels=params.n_channels,
epochs=params.epochs,
lr=params.lr,
batch_size=params.batch_size,
device=params.device,
num_workers=params.num_workers,
prefetch_factor=params.prefetch_factor,
resize_dimension=params.resize_dimension,
checkpoint_path=None if params.checkpoint_path == 'None' else Path(params.checkpoint_path),
optimizer=params.optimizer,
loss=params.loss,
apply_preprocessing=params.apply_preprocessing,
tune_lr=params.tune_lr,
dev_run=params.dev_run
)
return cfg
cfg = ConfigurationManager().get_training_config()
show_config(cfg)
[2024-06-18 11:21:15,898: INFO: common: yaml file: config.yaml loaded successfully]
[2024-06-18 11:21:15,902: INFO: common: yaml file: params.yaml loaded successfully]
Configuration:
models_dir: models
metadata_csv: data/metadataV2.csv
logs_dir: logs
architecture: DeepLabV3Plus
encoder: resnet50
encoder_weights: imagenet
n_classes: 1
n_channels: 3
epochs: 25
lr: 0.004
batch_size: 16
device: auto
num_workers: 32
prefetch_factor: 16
resize_dimension: 512
checkpoint_path: logs/DeepLabV3Plus_resnet50/checkpoints/epoch=22-val_f1=0.72.ckpt
optimizer: adamax
loss: JaccardLoss
apply_preprocessing: False
tune_lr: True
dev_run: False
In [ ]:
from DeepGlobeRoadExtraction import logger
from DeepGlobeRoadExtraction.utils.dataloader import RoadsDataModule, get_training_augmentation, get_preprocessing, get_preprocessing_function
from DeepGlobeRoadExtraction.utils.model import SegmentationModel
import warnings; warnings.filterwarnings("ignore")
import torch
import pytorch_lightning as pl
import matplotlib.pyplot as plt
import numpy as np
import random
torch.set_float32_matmul_precision('medium')
class TrainingComponents:
def __init__(self, config: TrainingConfig) -> None:
self.config = config
    def create_dataloaders(self):
        logger.info(f'------------- Creating Dataloaders -------------')
        # Build the optional encoder-specific preprocessing once, then share the
        # remaining RoadsDataModule arguments between both branches
        if self.config.apply_preprocessing:
            logger.info('------------->>> Applying Preprocessing <<<-------------')
            preprocessing = get_preprocessing(get_preprocessing_function(self.config.encoder, self.config.encoder_weights))
        else:
            logger.info('------------->>> Skipping Preprocessing <<<-------------')
            preprocessing = None
        self.dm = RoadsDataModule(
            metadata_csv=self.config.metadata_csv,
            augmentation=get_training_augmentation(),
            preprocessing=preprocessing,
            batch_size=self.config.batch_size,
            num_workers=self.config.num_workers,
            resize_dimensions=self.config.resize_dimension
        )
# Plot sample from the training dataset
@staticmethod
def plot_train_batch(dm, n_samples=4, randomised=True):
dm.setup('fit')
# Get the train dataloader
dataloader = dm.train_dataloader()
if randomised:
# Randomly select a batch of data
x, y = random.choice(list(dataloader))
else:
# Select from first batch of data
x, y = next(iter(dataloader))
# Plot the results
fig, axs = plt.subplots(n_samples, 2, figsize=(10, n_samples*5))
for i in range(n_samples):
# Plot the image
image = x[i].cpu().numpy().transpose(1, 2, 0) # (C, H, W) -> (H, W, C)
            # Contrast-stretch to the 2nd-98th percentile range (imshow ignores
            # vmin/vmax for RGB input, so normalise the array directly)
            vmin, vmax = np.percentile(image, 2), np.percentile(image, 98)
            image = np.clip((image - vmin) / (vmax - vmin + 1e-8), 0, 1)
            axs[i, 0].imshow(image)
axs[i, 0].axis('off')
if i == 0:
axs[i, 0].set_title('Image')
# Plot the ground truth mask
ground_truth_mask = y[i].cpu().numpy().squeeze() # (1, H, W) -> (H, W)
axs[i, 1].imshow(ground_truth_mask, cmap='binary_r')
axs[i, 1].axis('off')
if i == 0:
axs[i, 1].set_title('Ground Truth Mask')
plt.tight_layout()
plt.show()
def initialise_model(self):
        logger.info(f'------------- Initialising Model: Architecture: {self.config.architecture} | Encoder: {self.config.encoder} | Encoder Weights: {self.config.encoder_weights} -------------')
self.model = SegmentationModel(
architecture=self.config.architecture,
n_channels=self.config.n_channels,
n_classes=self.config.n_classes,
lr=self.config.lr,
encoder=self.config.encoder,
encoder_weights=self.config.encoder_weights,
loss=self.config.loss,
optimizer=self.config.optimizer,
)
def load_checkpoint(self):
if self.config.checkpoint_path is not None and os.path.exists(self.config.checkpoint_path):
logger.info('------------- Loading Checkpoint -------------')
logger.info(f'Loading checkpoint from {self.config.checkpoint_path}')
try:
self.model = SegmentationModel.load_from_checkpoint(self.config.checkpoint_path, hparams_file='params.yaml')
logger.info('Checkpoint loaded successfully')
except Exception as e:
logger.error(f'Failed to load checkpoint: {e}')
def create_callbacks(self):
logger.info('------------- Creating Callbacks -------------')
### Define Checkpoints for Early Stopping, Tensorboard Summary Writer, and Best Checkpoint Saving
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint
from pytorch_lightning.loggers import TensorBoardLogger
# Early stopping callback
self.early_stopping = EarlyStopping(
monitor='val_loss', # Metric to monitor
patience=10, # Number of epochs with no improvement after which training will be stopped
verbose=True,
mode='min' # Mode can be 'min' for minimizing the monitored metric or 'max' for maximizing it
)
# Model checkpoint callback
self.checkpoint_callback = ModelCheckpoint(
monitor='val_f1', # Metric to monitor
filename='{epoch:02d}-{val_f1:.2f}', # Filename format
save_top_k=1, # Save the top k models
mode='max', # Mode can be 'min' or 'max'
verbose=True,
dirpath=self.config.logs_dir.joinpath(f'{self.config.architecture}_{self.config.encoder}/checkpoints')
)
# TensorBoard logger
self.tensorboard_logger = TensorBoardLogger(
save_dir=self.config.logs_dir, # Directory to save the logs
name=f"{self.config.architecture}_{self.config.encoder}" # Experiment name
)
from pytorch_lightning.callbacks import LearningRateMonitor
# Learning rate monitor
self.lr_monitor = LearningRateMonitor(logging_interval='epoch')
def tune_lr(self):
if self.config.tune_lr:
from pytorch_lightning.tuner.tuning import Tuner
            logger.info('------------- Tuning Learning Rate -------------')
# Define a separate trainer for hyperparameter tuning
self.tuning_trainer = pl.Trainer(
accelerator=self.config.device,
precision="16-mixed",
logger=self.tensorboard_logger,
callbacks=None,
max_epochs=5 # Set this to a low number for faster tuning
)
self.dm.setup('fit')
# Hyperparameter tuning
self.tuner = Tuner(self.tuning_trainer)
self.new_lr = self.tuner.lr_find(self.model, train_dataloaders=self.dm.train_dataloader(), val_dataloaders=self.dm.val_dataloader()).suggestion()
logger.info(f'Suggested learning rate: {self.new_lr}')
else:
            logger.info('------------- Skipping Learning Rate Tuning -------------')
def create_trainer(self):
logger.info(f'------------- Training Model: {self.config.architecture} with {self.config.encoder} Encoder -------------')
self.trainer = pl.Trainer(
accelerator=self.config.device,
max_epochs=self.config.epochs,
precision="16-mixed",
logger= self.tensorboard_logger if hasattr(self, 'tensorboard_logger') else None,
callbacks=[self.early_stopping, self.checkpoint_callback, self.lr_monitor],
enable_progress_bar=True,
fast_dev_run=self.config.dev_run,
)
def train(self):
logger.info('------------- Training Started -------------')
self.dm.setup('fit')
if self.config.checkpoint_path is not None and os.path.exists(self.config.checkpoint_path):
logger.info(f'Resuming training from checkpoint: {self.config.checkpoint_path}')
self.trainer.fit(model=self.model, train_dataloaders=self.dm.train_dataloader(), val_dataloaders=self.dm.val_dataloader(), ckpt_path=self.config.checkpoint_path)
else:
self.trainer.fit(model=self.model, train_dataloaders=self.dm.train_dataloader(), val_dataloaders=self.dm.val_dataloader())
logger.info('------------- Training Completed -------------')
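For context, a minimal sketch of what a LightningModule like `SegmentationModel` could look like. The real implementation lives in `DeepGlobeRoadExtraction.utils.model`; this sketch assumes `segmentation_models_pytorch` (which provides the `DeepLabV3Plus` architecture and `JaccardLoss` seen in the model summary below) and omits the f1/iou/precision/recall metrics that the training logs show.
In [ ]:
# Hypothetical, simplified stand-in for SegmentationModel; not the project's actual class
import pytorch_lightning as pl
import segmentation_models_pytorch as smp
import torch

class MinimalSegmentationModel(pl.LightningModule):
    def __init__(self, architecture='DeepLabV3Plus', encoder='resnet50',
                 encoder_weights='imagenet', n_channels=3, n_classes=1, lr=1e-3):
        super().__init__()
        self.save_hyperparameters()
        # smp.create_model builds any supported architecture from its name
        self.model = smp.create_model(
            architecture, encoder_name=encoder, encoder_weights=encoder_weights,
            in_channels=n_channels, classes=n_classes
        )
        self.loss_fn = smp.losses.JaccardLoss(mode='binary')
    def forward(self, x):
        return self.model(x)
    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.loss_fn(self(x), y)
        self.log('train_loss', loss)
        return loss
    def validation_step(self, batch, batch_idx):
        x, y = batch
        self.log('val_loss', self.loss_fn(self(x), y), prog_bar=True)
    def configure_optimizers(self):
        # Adamax matches the optimizer named in params.yaml
        return torch.optim.Adamax(self.parameters(), lr=self.hparams.lr)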
In [ ]:
# help(pl.Trainer)
In [ ]:
# class TrainingPipeline:
# def __init__(self) -> None:
# pass
# def main(self):
# config = ConfigurationManager().get_training_config()
# pipeline = TrainingComponents(config=config)
# pipeline.create_dataloaders()
# pipeline.initialise_model()
# pipeline.load_checkpoint()
# pipeline.create_callbacks()
# pipeline.tune_lr()
# pipeline.create_trainer()
# pipeline.train()
# TrainingPipeline().main()
In [ ]:
config = ConfigurationManager().get_training_config()
pipeline = TrainingComponents(config=config)
pipeline.create_dataloaders()
pipeline.initialise_model()
# pipeline.load_checkpoint()
[2024-06-18 11:22:59,844: INFO: common: yaml file: config.yaml loaded successfully]
[2024-06-18 11:22:59,847: INFO: common: yaml file: params.yaml loaded successfully]
[2024-06-18 11:22:59,848: INFO: 3293270256: ------------- Creating Dataloaders -------------]
[2024-06-18 11:22:59,849: INFO: 3293270256: ------------->>> Skipping Preprocessing <<<-------------]
[2024-06-18 11:22:59,862: INFO: 3293270256: ------------- Initialising Model: Architecture: DeepLabV3Plus | Encoder: resnet50 | Encoder Weights: imagenet -------------]
In [ ]:
pipeline.plot_train_batch(dm=pipeline.dm, randomised=False)
In [ ]:
pipeline.create_callbacks()
pipeline.tune_lr()
pipeline.create_trainer()
pipeline.train()
[2024-06-18 11:23:05,637: INFO: 3293270256: ------------- Creating Callbacks -------------]
[2024-06-18 11:23:05,650: INFO: 3293270256: ------------- Tuning Learning Rate -------------]
[2024-06-18 11:23:06,454: INFO: rank_zero: Using 16bit Automatic Mixed Precision (AMP)]
[2024-06-18 11:23:06,561: INFO: rank_zero: GPU available: True (cuda), used: True]
[2024-06-18 11:23:06,562: INFO: rank_zero: TPU available: False, using: 0 TPU cores]
[2024-06-18 11:23:06,562: INFO: rank_zero: HPU available: False, using: 0 HPUs]
[2024-06-18 11:23:07,364: INFO: cuda: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]]
Finding best initial lr: 0%| | 0/100 [00:00<?, ?it/s]
[2024-06-18 11:24:09,721: INFO: rank_zero: `Trainer.fit` stopped: `max_steps=100` reached.]
[2024-06-18 11:24:09,723: INFO: lr_finder: Learning rate set to 0.002754228703338169]
[2024-06-18 11:24:09,723: INFO: rank_zero: Restoring states from the checkpoint path at /mnt/e/Research/dvc/deepgloberoadextraction/.lr_find_3692414d-ff7e-4620-861e-594a563a1788.ckpt]
[2024-06-18 11:24:10,219: INFO: rank_zero: Restored all states from the checkpoint at /mnt/e/Research/dvc/deepgloberoadextraction/.lr_find_3692414d-ff7e-4620-861e-594a563a1788.ckpt]
[2024-06-18 11:24:11,448: INFO: 3293270256: Suggested learning rate: 0.002754228703338169]
[2024-06-18 11:24:11,449: INFO: 3293270256: ------------- Training Model: DeepLabV3Plus with resnet50 Encoder -------------]
[2024-06-18 11:24:11,464: INFO: rank_zero: Using 16bit Automatic Mixed Precision (AMP)]
[2024-06-18 11:24:11,567: INFO: rank_zero: GPU available: True (cuda), used: True]
[2024-06-18 11:24:11,567: INFO: rank_zero: TPU available: False, using: 0 TPU cores]
[2024-06-18 11:24:11,568: INFO: rank_zero: HPU available: False, using: 0 HPUs]
[2024-06-18 11:24:11,569: INFO: 3293270256: ------------- Training Started -------------]
[2024-06-18 11:24:11,572: INFO: 3293270256: Resuming training from checkpoint: logs/DeepLabV3Plus_resnet50/checkpoints/epoch=22-val_f1=0.72.ckpt]
[2024-06-18 11:24:11,576: INFO: rank_zero: Restoring states from the checkpoint path at logs/DeepLabV3Plus_resnet50/checkpoints/epoch=22-val_f1=0.72.ckpt]
[2024-06-18 11:24:12,591: INFO: cuda: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]]
[2024-06-18 11:24:12,658: INFO: model_summary:
  | Name    | Type          | Params | Mode
--------------------------------------------------
0 | model   | DeepLabV3Plus | 26.7 M | train
1 | loss_fn | JaccardLoss   | 0      | train
--------------------------------------------------
26.7 M    Trainable params
0         Non-trainable params
26.7 M    Total params
106.710   Total estimated model params size (MB)]
[2024-06-18 11:24:12,745: INFO: rank_zero: Restored all states from the checkpoint at logs/DeepLabV3Plus_resnet50/checkpoints/epoch=22-val_f1=0.72.ckpt]
Sanity Checking: | | 0/? [00:00<?, ?it/s]
Training: | | 0/? [00:00<?, ?it/s]
Validation: | | 0/? [00:00<?, ?it/s]
[2024-06-18 11:26:30,986: INFO: rank_zero: Epoch 23, global step 6528: 'val_f1' was not in top 1]
Validation: | | 0/? [00:00<?, ?it/s]
[2024-06-18 11:28:34,612: INFO: early_stopping: Metric val_loss improved by 0.001 >= min_delta = 0.0. New best score: 0.447]
[2024-06-18 11:28:34,613: INFO: rank_zero: Epoch 24, global step 6800: 'val_f1' was not in top 1]
[2024-06-18 11:28:34,620: INFO: rank_zero: `Trainer.fit` stopped: `max_epochs=25` reached.]
[2024-06-18 11:28:35,051: INFO: 3293270256: ------------- Training Completed -------------]
In [ ]:
# Evaluation
# Setup and test
pipeline.dm.setup('test')
try:
    # With no model argument, Lightning restores the best checkpoint from the fit run
    test_results = pipeline.trainer.test(dataloaders=pipeline.dm.test_dataloader())
    # Trainer.test returns a list with one metrics dict per test dataloader
    metrics = test_results[0] if test_results else {}
    # Persisting the metrics via save_json is deferred to the evaluation stage (Stage 4)
except Exception as e:
print(f"An error occurred: {e}")
[2024-06-18 10:14:57,640: INFO: rank_zero: Restoring states from the checkpoint path at /mnt/e/Research/dvc/deepgloberoadextraction/logs/DeepLabV3Plus_resnet50/checkpoints/epoch=22-val_f1=0.72.ckpt]
[2024-06-18 10:14:58,677: INFO: cuda: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]]
[2024-06-18 10:14:58,794: INFO: rank_zero: Loaded model weights from the checkpoint at /mnt/e/Research/dvc/deepgloberoadextraction/logs/DeepLabV3Plus_resnet50/checkpoints/epoch=22-val_f1=0.72.ckpt]
Testing: | | 0/? [00:00<?, ?it/s]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Test metric DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
test_f1 0.7152367830276489
test_f2 0.7285547256469727
test_iou 0.5573932528495789
test_loss 0.448394775390625
test_precision 0.6897551417350769
test_recall 0.7359154224395752
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
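As a consistency check, binary Dice/F1 and IoU satisfy the identity IoU = F1 / (2 - F1) per sample, and the aggregated numbers above agree with it to within batch-averaging error:
In [ ]:
# Verify the Dice/Jaccard identity against the reported test metrics
f1 = 0.7152367830276489
print(f1 / (2 - f1))  # ~0.557, close to the reported test_iou of 0.5574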
In [ ]:
import torch
import matplotlib.pyplot as plt
import numpy as np
import random
pipeline.dm.setup('test')
# Get the test dataloader
test_dataloader = pipeline.dm.test_dataloader()
test_dataloader_list = list(test_dataloader)
# Randomly select a batch of data
x, y = random.choice(test_dataloader_list)
# Put the model in evaluation mode
pipeline.model.eval()
# Disable gradients for this step
with torch.no_grad():
    # Pass the data through the model
    y_hat = pipeline.model(x)
# Plot the results: image, ground truth, raw prediction, thresholded prediction
fig, axs = plt.subplots(4, 4, figsize=(20, 20))
for i in range(4):
    # Plot the image: (C, H, W) -> (H, W, C)
    image = x[i][:3].permute(1, 2, 0).cpu().numpy()
    axs[i, 0].imshow(image)
    axs[i, 0].axis('off')
    if i == 0:
        axs[i, 0].set_title('Image')
    # Plot the ground truth mask: (1, H, W) -> (H, W)
    ground_truth_mask = y[i].squeeze().cpu().numpy()
    axs[i, 1].imshow(ground_truth_mask, cmap='binary_r')
    axs[i, 1].axis('off')
    if i == 0:
        axs[i, 1].set_title('Ground Truth Mask')
    # Plot the predicted mask - with and without thresholding
    predicted_mask = torch.sigmoid(y_hat[i]).squeeze().cpu().numpy()
    axs[i, 2].imshow(predicted_mask, cmap='RdYlGn')
    axs[i, 2].axis('off')
    if i == 0:
        axs[i, 2].set_title('Predicted Mask (Raw)')
    axs[i, 3].imshow(predicted_mask > 0.5, cmap='binary_r')
    axs[i, 3].axis('off')
    if i == 0:
        axs[i, 3].set_title('Predicted Mask (Thresholded)')
plt.tight_layout()
plt.show()
Stage 4: Evaluation¶
- Initialise and Load Model Checkpoint
- Create Data Loader
- Evaluate on Test Dataloader
- Save Metrics as JSON
- Plot a Random Batch from the Test Dataset with Predictions and Save inside the `plots` Directory
In [ ]:
import os
os.chdir('../')
print(f'Current Working Directory: {os.getcwd()}')
Current Working Directory: /mnt/e/Research/dvc/deepgloberoadextraction
In [ ]:
from pathlib import Path
from dataclasses import dataclass
@dataclass(frozen=True)
class EvaluationConfig:
# config.yaml
models_dir: Path
results_dir: Path
metrics_filepath: Path
metadata_csv: Path
# params.yaml
batch_size: int
num_workers: int
resize_dimension: int
device: str
model_path: Path
save_predictions: bool
f1_threshold: float
from DeepGlobeRoadExtraction.utils.common import read_yaml, create_directories, show_config
from DeepGlobeRoadExtraction import CONFIG_FILE_PATH, PARAMS_FILE_PATH
class ConfigurationManager:
def __init__(self, config_filepath: Path = CONFIG_FILE_PATH, params_filepath: Path = PARAMS_FILE_PATH):
self.config = read_yaml(config_filepath)
self.params = read_yaml(params_filepath)
    def get_evaluation_config(self) -> EvaluationConfig:
config = self.config.evaluation
params = self.params
plots_dir = Path(config.results_dir) / "plots"
create_directories([config.models_dir, config.results_dir, plots_dir])
        cfg = EvaluationConfig(
models_dir=config.models_dir,
results_dir=config.results_dir,
metrics_filepath=config.metrics_filepath,
metadata_csv=config.metadata_csv,
batch_size=params.batch_size,
num_workers=params.num_workers,
resize_dimension=params.resize_dimension,
device=params.device,
model_path=params.model_path,
save_predictions=params.save_predictions,
f1_threshold=params.f1_threshold
)
return cfg
cfg = ConfigurationManager().get_evaluation_config()
show_config(cfg)
[2024-06-18 13:32:38,039: INFO: common: yaml file: config.yaml loaded successfully]
[2024-06-18 13:32:38,043: INFO: common: yaml file: params.yaml loaded successfully]
[2024-06-18 13:32:38,044: INFO: common: created directory at: models]
[2024-06-18 13:32:38,045: INFO: common: created directory at: results]
[2024-06-18 13:32:38,047: INFO: common: created directory at: results/plots]
Configuration:
models_dir: models
results_dir: results
metrics_filepath: results/metrics.json
metadata_csv: data/metadataV2.csv
batch_size: 16
num_workers: 32
resize_dimension: 512
device: auto
model_path: logs/DeepLabV3Plus_resnet50/checkpoints/epoch=22-val_f1=0.72.ckpt
save_predictions: True
f1_threshold: 0.7
In [ ]:
### Components
from DeepGlobeRoadExtraction import logger
from DeepGlobeRoadExtraction.utils.dataloader import RoadsDataModule
from DeepGlobeRoadExtraction.utils.model import SegmentationModel
from DeepGlobeRoadExtraction.utils.common import save_json
import warnings; warnings.filterwarnings("ignore")
import torch
import pytorch_lightning as pl
import matplotlib.pyplot as plt
import numpy as np
torch.set_float32_matmul_precision('medium')
class EvaluationComponents:
    def __init__(self, config: EvaluationConfig):
self.config = config
def create_dataloaders(self):
self.dm = RoadsDataModule(
metadata_csv=self.config.metadata_csv,
augmentation=None,
preprocessing=None,
batch_size=self.config.batch_size,
num_workers=self.config.num_workers,
resize_dimensions=self.config.resize_dimension
)
def load_model(self):
if self.config.model_path is not None and os.path.exists(self.config.model_path):
logger.info('------------- Loading Checkpoint -------------')
logger.info(f'Loading checkpoint from {self.config.model_path}')
try:
self.model = SegmentationModel.load_from_checkpoint(self.config.model_path)
logger.info('Checkpoint loaded successfully')
except Exception as e:
logger.error(f'Failed to load checkpoint: {e}')
def create_trainer(self):
logger.info(f'------------- Creating Trainer -------------')
self.trainer = pl.Trainer(
accelerator=self.config.device,
# max_epochs=self.config.epochs,
inference_mode=True,
precision="16-mixed",
logger=self.tensorboard_logger if hasattr(self, 'tensorboard_logger') else None,
callbacks=None,
enable_progress_bar=True,
)
def evaluate(self):
logger.info('------------- Evaluating Model -------------')
self.dm.setup('test')
try:
test_results = self.trainer.test(model=self.model, dataloaders=self.dm.test_dataloader())
            # Trainer.test returns a list with one metrics dict per test dataloader
self.metrics = test_results[0] if test_results else {}
# Save metrics to a file
logger.info(f'Saving metrics to: {self.config.metrics_filepath}')
save_json(Path(self.config.metrics_filepath), self.metrics) # Save the metrics to a file
# Save Model as ONNX
if self.metrics['test_f1'] > self.config.f1_threshold:
logger.info(f"Test F1-Score ({self.metrics['test_f1']:.3f}) above threshold ({self.config.f1_threshold}), saving model as ONNX...")
save_path = Path(self.config.models_dir).joinpath(f"{Path(self.config.model_path).parent.parent.name}_test_f1_{self.metrics['test_f1']:.3f}.onnx")
logger.info(f'Saving model to: {save_path}')
test_dataloader = self.dm.test_dataloader()
input_sample, _ = next(iter(test_dataloader))
input_sample = input_sample[0].unsqueeze(0)
self.model.to_onnx(save_path, input_sample=input_sample, export_params=True)
except Exception as e:
print(f"An error occurred: {e}")
def save_predictions(self):
if self.config.save_predictions:
logger.info('------------- Saving Predictions -------------')
# Get the test dataloader
test_dataloader = self.dm.test_dataloader()
# Randomly select a batch of data
x, y = next(iter(test_dataloader))
# Put the model in evaluation mode
self.model.eval()
# Disable gradients for this step
with torch.no_grad():
# Pass the data through the model
y_hat = self.model(x)
            # Plot the results: image, ground truth, raw prediction, thresholded prediction
            _, axs = plt.subplots(4, 4, figsize=(20, 20))
            for i in range(4):
                # Plot the image: (C, H, W) -> (H, W, C)
                image = x[i][:3].permute(1, 2, 0).cpu().numpy()
                axs[i, 0].imshow(image)
                axs[i, 0].axis('off')
                if i == 0:
                    axs[i, 0].set_title('Image')
                # Plot the ground truth mask: (1, H, W) -> (H, W)
                ground_truth_mask = y[i].squeeze().cpu().numpy()
                axs[i, 1].imshow(ground_truth_mask, cmap='binary_r')
                axs[i, 1].axis('off')
                if i == 0:
                    axs[i, 1].set_title('Ground Truth Mask')
                # Plot the predicted mask - with and without thresholding
                predicted_mask = torch.sigmoid(y_hat[i]).squeeze().cpu().numpy()
                axs[i, 2].imshow(predicted_mask, cmap='RdYlGn')
                axs[i, 2].axis('off')
                if i == 0:
                    axs[i, 2].set_title('Predicted Mask (Raw)')
                axs[i, 3].imshow(predicted_mask > 0.5, cmap='binary_r')
                axs[i, 3].axis('off')
                if i == 0:
                    axs[i, 3].set_title('Predicted Mask (Thresholded)')
plt.tight_layout()
# Save as PNG
save_path = Path(self.config.results_dir).joinpath(f"plots/{Path(self.config.model_path).parent.parent.name}_test_f1_{self.metrics['test_f1']:.3f}.png")
logger.info(f'Saving predictions to: {save_path}')
plt.savefig(save_path, bbox_inches='tight', dpi=200)
plt.show()
In [ ]:
class EvaluationPipeline:
def __init__(self) -> None:
pass
def main(self):
config = ConfigurationManager().get_evaluation_config()
        pipeline = EvaluationComponents(config=config)
pipeline.create_dataloaders()
pipeline.load_model()
pipeline.create_trainer()
pipeline.evaluate()
pipeline.save_predictions()
pipeline = EvaluationPipeline()
pipeline.main()
[2024-06-18 13:34:29,191: INFO: common: yaml file: config.yaml loaded successfully]
[2024-06-18 13:34:29,194: INFO: common: yaml file: params.yaml loaded successfully]
[2024-06-18 13:34:29,196: INFO: common: created directory at: models]
[2024-06-18 13:34:29,197: INFO: common: created directory at: results]
[2024-06-18 13:34:29,200: INFO: common: created directory at: results/plots]
[2024-06-18 13:34:29,210: INFO: 3517949174: ------------- Loading Checkpoint -------------]
[2024-06-18 13:34:29,211: INFO: 3517949174: Loading checkpoint from logs/DeepLabV3Plus_resnet50/checkpoints/epoch=22-val_f1=0.72.ckpt]
[2024-06-18 13:34:32,587: INFO: 3517949174: Checkpoint loaded successfully]
[2024-06-18 13:34:32,588: INFO: 3517949174: ------------- Creating Trainer -------------]
[2024-06-18 13:34:33,464: INFO: rank_zero: Using 16bit Automatic Mixed Precision (AMP)]
[2024-06-18 13:34:33,570: INFO: rank_zero: GPU available: True (cuda), used: True]
[2024-06-18 13:34:33,571: INFO: rank_zero: TPU available: False, using: 0 TPU cores]
[2024-06-18 13:34:33,571: INFO: rank_zero: HPU available: False, using: 0 HPUs]
[2024-06-18 13:34:33,583: INFO: 3517949174: ------------- Evaluating Model -------------]
[2024-06-18 13:34:34,275: INFO: cuda: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]]
Testing: | | 0/? [00:00<?, ?it/s]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Test metric DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
test_f1 0.7152367830276489
test_f2 0.7285547256469727
test_iou 0.5573932528495789
test_loss 0.448394775390625
test_precision 0.6897551417350769
test_recall 0.7359154224395752
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
[2024-06-18 13:34:45,214: INFO: 3517949174: Saving metrics to: results/metrics.json]
[2024-06-18 13:34:45,216: INFO: common: json file saved at: results/metrics.json]
[2024-06-18 13:34:45,216: INFO: 3517949174: Test F1-Score (0.715) above threshold (0.7), saving model as ONNX...]
[2024-06-18 13:34:45,217: INFO: 3517949174: Saving model to: models/DeepLabV3Plus_resnet50_test_f1_0.715.onnx]
[2024-06-18 13:34:51,349: INFO: 3517949174: ------------- Saving Predictions -------------]
[2024-06-18 13:35:00,152: INFO: 3517949174: Saving predictions to: results/plots/DeepLabV3Plus_resnet50_test_f1_0.715.png]
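As a final smoke test, the exported ONNX model can be loaded and run outside PyTorch (a sketch assuming `onnxruntime` is installed and a fixed 1x3x512x512 input, matching `resize_dimension` and the single-sample export above):
In [ ]:
import numpy as np
import onnxruntime as ort
# Path taken from the ONNX export log above
session = ort.InferenceSession('models/DeepLabV3Plus_resnet50_test_f1_0.715.onnx')
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 512, 512).astype(np.float32)
logits = session.run(None, {input_name: dummy})[0]
print(logits.shape)  # expected: (1, 1, 512, 512)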
In [ ]: